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We propose a method that would allow for a rigorous statistical analysis of 
neural responses to natural stimuli, which are non-Gaussian and exhibit strong 
correlations. We have in mind a model in which neurons are selective for a 
small number of stimulus dimensions out of the high dimensional stimulus 
space, but within this subspace the responses can be arbitrarily nonlinear. Ex- 
isting analysis methods are based on correlation functions between stimuli 
and responses, but these methods are guaranteed to work only in the case of 
Gaussian stimulus ensembles. As an alternative to correlation functions, we 
maximize the mutual information between the neural responses and projec- 
tions of the stimulus onto low dimensional subspaces. The procedure can be 
done iteratively by increasing the dimensionality of this subspace. Those di- 
mensions that allow the recovery of all of the information between spikes and 
the full unprojected stimuli describe the relevant subspace. If the dimensional- 
ity of the relevant subspace indeed is small, it becomes feasible to map out the 
neuron's input-output function even under fully natural stimulus conditions. 
These ideas are illustrated in simulations on model visual neurons responding 
to natural scenes. 



1 Introduction 



From olfaction to vision and audition, a growing number of experiments [|l]]-[]8|] are ex- 
amining the responses of sensory neurons to natural stimuli. Observing the full dynamic 
range of neural responses may require using stimulus ensembles which approximate 
those [0, [TIJ occurring in nature, and it is an attractive hypothesis that the neural repre- 
sentation of these natural signals may be optimized in some way [|11|]-[]14|1. Many neurons 
exhibit strongly nonlinear and adaptive responses that are unlikely to be predicted from 
a combination of responses to simple stimuli; in particular neurons have been shown to 



1 



adapt to the distribution of sensory inputs, so that any characterization of these responses 
will depend on context Ql5| , [161 ]. Finally the variability of neural response decreases sub- 
stantially when complex dynamical, rather than static, stimuli are used ]]17]]-1]2(J|]. All of 
these arguments point to the need for general tools to analyze the neural responses to 
complex, naturalistic inputs. 

The stimuli analyzed by sensory neurons are intrinsically high dimensional, with di- 
mensions D ~ 10 2 — 10 3 . For example, in the case of visual neurons, the input is specified 
as light intensity on a grid of at least 10x10 pixels. Each of the presented stimuli can be 
described as a vector s in this high dimensional stimulus space. It is important that stimuli 
need not be pictured as being drawn as isolated points from this space. Thus, if stimuli 
are varying continuously in time we can think of the stimulus s as describing a recent 
window of the stimulus history (e. g., the past K frames of the movie, with dimensional- 
ity K times larger than for the description of a single frame) and then the distribution of 
stimuli -P(s) is sampled along some meandering trajectory in this space; we will assume 
this process is ergodic, so that we can exchange averages over time with averages over 
the true distribution as needed. 

Even though direct exploration of a D ~ 10 2 — 10 3 dimensional stimulus space is 
beyond the constraints of experimental data collection, progress can be made provided 
we make certain assumptions about how the response has been generated. In the simplest 
model, the probability of response can be described by one receptive field (RF) or linear 
filter The receptive field can be thought of as a template or special direction v in 
the stimulus space such that the neuron's response depends only on a projection of a 
given stimulus s onto v, although the dependence of the response on this projection can 
be strongly nonlinear. In this simple model, the reverse correlation method 0, |21[] can be 
used to recover the vector v by analyzing the neuron's responses to Gaussian white noise. 
In a more general case, the probability of the response depends on projections Sj = • s 
of the stimulus s on a set of vectors {ei, e 2 , ... , e n }: 

P(spike|s) = P(spike)/(si, s 2 , s n ), (1) 

where P(spike|s) is the probability of a spike given a stimulus s and P(spike) is the average 
firing rate. Even though the ideas developed below can be used to analyze input-output 
functions / with respect to different neural responses, such as patterns of spikes in time 
[ ^2] , £3]], we choose a single spike as the response of interest. The vectors {<=;} may also 
describe how the time dependence of stimulus s affects the probability of a spike. We will 
call the subspace spanned by the set of vectors {e^} the relevant subspace (RS). 

Equation (0]) in itself is not yet a simplification if the dimensionality n of the RS is 
equal to the dimensionality D of the stimulus space. In this paper we will use the idea of 
dimensionality reduction []T5|, |2"2| |2"3|] and assume that n <C D. The input-output function 
/ in Eq. ([1]) can be strongly nonlinear, but it is presumed to depend only on a small number 
of projections. This assumption appears to be less stringent than that of approximate 
linearity which one makes when characterizing neuron's response in terms of Wiener 
kernels (see, for example, the discussion in Section 2.1.3 of Ref. []^]). The most difficult 
part in reconstructing the input-output function is to find the RS. Note that for n > 1, a 
description in terms of any linear combination of vectors {e^} is just as valid, since we did 
not make any assumptions as to a particular form of nonlinear function /. 
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Once the relevant subspace is known, the probability P (spike |s) becomes a function of 
only few parameters, and it becomes feasible to map this function experimentally, invert- 
ing the probability distributions according to Bayes' rule: 

/««» = y . © 

If stimuli are chosen from a correlated Gaussian noise ensemble, then the neural response 



can be characterized by the spike-triggered covariance method [flq , [221 , |24|] . It can be 
shown that the dimensionality of the RS is equal to the number of nonzero eigenvalues of 
a matrix given by a difference between covariance matrices of all presented stimuli and 
stimuli conditional on a spike. Moreover, the RS is spanned by the eigenvectors asso- 
ciated with the nonzero eigenvalues multiplied by the inverse of the a priori covariance 
matrix. Compared to the reverse correlation method, we are no longer limited to finding 
only one of the relevant directions e^. Both the reverse correlation and the spike-triggered 
covariance method, however, give rigorously interpretable results only for Gaussian dis- 
tributions of inputs. 

In this paper we investigate whether it is possible to lift the requirement for stimuli to 
be Gaussian. When using natural stimuli, which certainly are non-Gaussian, the RS can- 
not be found by the spike-triggered covariance method. Similarly, the reverse correlation 
method does not give the correct RF, even in the simplest case where the input-output 
function in Eq. (P depends only on one projection. However, vectors that span the RS 
clearly are special directions in the stimulus space independent of assumptions about 
P(s). This notion can be quantified by Shannon information, and an optimization prob- 
lem can be formulated to find the RS. We illustrate how the optimization scheme works 
with natural stimuli for model orientation sensitive cells with one and two relevant di- 
rections, much like simple and complex cells found in primary visual cortex. It also is 
possible to estimate average errors in the reconstruction. The advantage of this optimiza- 
tion scheme is that it does not rely on any specific statistical properties of the stimulus 
ensemble, and can thus be used with natural stimuli. 



2 Information as an objective function 



When analyzing neural responses, we compare the a priori probability distribution of all 
presented stimuli with the probability distribution of stimuli which lead to a spike [ J22|] . 
For Gaussian signals, the probability distribution can be characterized by its second mo- 
ment, the covariance matrix. However, an ensemble of natural stimuli is not Gaussian, 
so that neither second nor any other finite number of moments is sufficient to describe 
the probability distribution. In this situation, Shannon information provides the rigorous 
way of comparing two probability distributions. The average information carried by the 
arrival time of one spike is given by 



/spike = J rf D sP(s|spike) log 2 



P(s|spike) 



(3) 
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The information per spike as written in (|3|) is difficult to estimate experimentally, since it 
requires either sampling of the high-dimensional probability distribution P(s| spike) or a 
model of how spikes were generated, i.e. the knowledge of low-dimensional RS. How- 
ever it is possible to calculate / sp ik e in a model-independent way, if stimuli are presented 
multiple times to estimate the probability distribution P(spike|s). Then, 



/ P(spike|s) 
kpikc ~ \ P(spike) bg2 



P(spike|s) 
P(spike) 



(4) 



where the average is taken over all presented stimuli. As discussed in [^3p, this is useful 
in practice because we can replace the ensemble average () s with a time average, and 
P (spike | s) with the time dependent spike rate r{t). Note that for a finite dataset of N 
trials, the obtained value I sp ik e (A0 will be on average larger than J sp ikc(oo), with difference 
~ ^ s ti m uii/ (iVgpike 2 In 2), where A^ stimu ii is the number of different stimuli, and iVgpikc is the 
number of elicited spikes |]25|]. The true value / spi ke can also be found by extrapolating 
to N — ► oo ||2~3| , ^6|] . Measurement of 7 sp i ke in this way provides a model independent 
benchmark against which we can compare any description of the neuron's input-output 
relation. 

Having in mind a model in which spikes are generated according to projection onto a 
low dimensional subspace, we start by projecting all of the presented stimuli on a partic- 
ular direction v in the stimulus space, and form probability distributions 

P v (x|spike) = (5(x — s ■ v)|spike) s , (5) 
P v (x) = (6(x-a-v)) a , (6) 

where (• • • | spike) denotes an expectation value conditional on the occurrence of a spike. 
The information 



I(v) = J dxP v (x | spike) log 2 



P v (re | spike) 
P v (x) 



(7) 



provides an invariant measure of how much the occurrence of a spike is determined by 
projection on the direction v. It is a function only of direction in the stimulus space and 
does not change when vector v is multiplied by a constant. This can be seen by noting that 
for any probability distribution and any constant c, P cv (x) = c^ 1 P v (x/c). When evaluated 
along any vector, I(v) < / sp ikc- The total information J spikc can be recovered along one 
particular direction only if v = e x , and the RS is one dimensional. 

By analogy with (0), one could also calculate information I(vi, v n ) along a set of 
several directions {vi, v n } based on the multi-point probability distributions: 



P vi ,...,v„({^i}|spike) = / Y[5(xi - s • v;) (spike \ , (8) 

\i=l I s 

l\ ,.({■>;}) = (flSixi-B-Vi)) . (9) 



4=1 



If we are successful in finding all of the n directions ej in the input-output relation 
(0), then the information evaluated in this subspace will be equal to the total information 
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hpike- When we calculate information along a set of n vectors that are slightly off from 
the RS, the answer is, of course, smaller than / sp ik e and is initially quadratic in deviations 
<5vj. One can therefore hope to find the RS by maximizing information with respect to n 
vectors simultaneously The information does not increase if more vectors outside the RS 
are included. For uncorrelated stimuli, any vector or a set of vectors that maximizes J(v) 
belongs to the RS. On the other hand, the result of optimization with respect to a number 
of vectors k < n may deviate from the RS if stimuli are correlated. To find the RS, we first 
maximize I(v), and compare this maximum with / sp ik e / which is estimated according to 
(p. If the difference exceeds that expected from finite sampling corrections, we increment 
the number of directions with respect to which information is simultaneously maximized. 

The information /(v) as defined by (0) is a continuous function, whose gradient can 
be computed. We find 

V v -f = J dxP v (x) [{s\x, spike) — (s\x)] ■ 

where 

(six, spike) = -=-— : — - — r / d D ss5(x — s • v)P(slspike), (11) 
-r(x|spike) J 

and similarly for (s|x). Since information does not change with the length of the vector, 
v ■ V v / = (which can also be seen from ( |T0"1 ) directly). 

As an optimization algorithm, we have used a combination of gradient ascent and 
simulated annealing algorithms: successive line maximizations were done along the di- 
rection of the gradient. During line maximizations, a point with a smaller value of infor- 
mation was accepted according to Boltzmann statistics, with probability cx exp[(I(vj + i) — 
J(vj))/T]. The effective temperature T is reduced upon completion of each line maxi- 
mization. 



d P v (x | spike) 
dx P v (x) 



(10) 



3 Results 

We tested the scheme of looking for the most informative directions on model neurons 
that respond to stimuli derived from natural scenes. As stimuli we used patches of black 
and white photos digitized to 8 bits, in which no corrections were made for camera's 
light intensity transformation function. Our goal is to demonstrate that even though the 
correlations present in natural scenes are non-Gaussian, they can be successfully removed 
from the estimate of vectors defining the RS. 



3.1 A model simple cell 

Our first example is based on the properties of simple cells found in the primary visual 
cortex. A model phase and orientation sensitive cell has a single relevant direction e± 
shown in Fig. |I](a). A given frame s leads to a spike if the projection s± = s • e\ reaches a 
threshold value s t in the presence of noise: 

P (spike | s) 



P(spike) 



= f(8 1 ) = (e(8 1 -8 t +0), (12) 
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Figure 1: Analysis of a model simple cell with RF shown in (a). The "exact" spike- 
triggered average v sta is shown in (b). Panel (c) shows an attempt to remove correlations 
according to reverse correlation method, C~ x riorj \ stai ; (d) vector t> max found by maximizing 
information; (e) convergence of the algorithm according to information /(v) and projec- 
tion v - ei as a function of inverse effective temperature T~ x . (f) The probability of a spike 
P(spike|s • Vmax) (crosses) is compared to P(spike|si) used in generating spikes (solid line). 
Parameters a = 0.05(s max — s min ) and s t = 0.8(s max — s min ) [s max and s min are the maximum 
and minimum values of s\ over the ensemble of presented stimuli]. 



where a Gaussian random variable £ of variance a 2 models additive noise, and the func- 
tion 6{x) = 1 for x > 0, and zero otherwise. Together with the RF e.\, the parameters s t for 
threshold and the noise variance a 2 determine the input-output function. 

The spike-triggered average (STA), or reverse correlation function [ZI| ] / shown in 
Fig. |l|(b), is broadened because of spatial correlations present in the stimuli. In a model, 
the effect of noise on our estimate of the STA can be eliminated by averaging the presented 
stimuli weighted with the exact firing rate, as opposed to using a histogram of responses 
to estimate P (spike |s) from a finite set of trials. We have used this "exact" STA, 

= J d D s sP(s|spike) = p 1 J d D sP(s) sP(spike|s), (13) 
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in calculations presented in Fig. |T](bc). If stimuli were drawn from a Gaussian probability 
distribution, they could be decorrelated by multiplying v sta by the inverse of the a priori 
covariance matrix, according to the reverse correlation method, v Gaussian est oc C~p riori v stSj . 
The procedure is not valid for non-Gaussian stimuli and nonlinear input-output func- 
tions (fj). The result of such a decorrelation is shown in Fig. |I](c). It clearly is missing some 
of the structure in the model filter, with projection ex • v Gaussianest ~ 0.14. The discrepancy 
is not due to neural noise or finite sampling, since the "exact" STA was decorrelated; 
the absence of noise in the exact STA also means that there would be no justification for 
smoothing the results of the decorrelation. The discrepancy between the true receptive 
field and the decorrelated STA increases with the strength of nonlinearity in the input- 
output function. 

In contrast, it is possible to obtain a good estimate of the relevant direction ex by max- 
imizing information directly, see panel (d). A typical progress of the simulated annealing 
algorithm with decreasing temperature T is shown in Fig. ^e). There we plot both the in- 
formation along the vector, and its projection on e\. The final value of projection depends 
on the size of the data set, see below. In the example shown in Fig. [I] there were ~ 50, 000 
spikes with average probability of spike ~ 0.05 per frame, and the reconstructed vector 
has projection v max ■ ex ~ 0.9. Having estimated the RF, one can proceed to sample the 
nonlinear input-output function. This is done by constructing histograms for P(s • v max ) 
and P(s ■ -0 max | spike) of projections onto vector -£> max found by maximizing information, 
and taking their ratio, as in Eq. (Q). In Fig. |T|(f) we compare P (spike |s • v max ) (crosses) with 
the probability P(spike|si) used in the model (solid line). 



3.2 Estimated deviation from the optimal direction 

When information is calculated from a finite data set, the vector v which maximizes / will 
deviate from the true RF ex. The deviation 5\ = v — e\ arises because the probability dis- 
tributions are estimated from experimental histograms and differ from the distributions 
found in the limit on infinite data size. For a simple cell, the quality of reconstruction 
can be characterized by the projection v • e\ — 1 — |<5v 2 , where both v and ex are normal- 
ized, and 5v is by definition orthogonal to ex- The deviation <5v ~ A~ l VI, where A is the 
Hessian of information. Its structure is similar to that of a covariance matrix: 

a 1 f i r,r i ■■, \ ( d , P(x|spike)\ 2 . 

A,j ■ = J dxP(x\spike) I — In — -^—^ — \ {{SiSj\x) - {s i \x){s j \x)). (14) 

When averaged over possible outcomes of iV trials, the gradient of information is zero 
for the optimal direction. Here in order to evaluate (<5v 2 ) = Tr[y4 _1 (V/V/ T )v4 _1 ], we 
need to know the variance of the gradient of /. By discretizing both the space of stimuli 
and possible projections x, and assuming that the probability of generating a spike is 
independent for different bins, we estimate (VijV/j) ~ Aij/(N spike ln2). Therefore an 
expected error in the reconstruction of the optimal filter is inversely proportional to the 
number of spikes and is given by: 

l_ v .e 1 ^l( ( 5v 2 )= Tr[A ' 1] (15) 
2 V 1 2iV spike ln2 
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Figure 2: Projection of vector t> max that maximizes information on RF e± is plotted as a 
function of the number of spikes to show the linear scaling in 1/A^ sp ike- In this series of 
simulations, the average probability of a spike ([12]) had parameter values a = 0.1(s max — 
Smm) and s t = 0.6(s max — s mm ). 



In Fig. we plot the average projection of the normalized reconstructed vector v on the 
RF ei, and show that it scales correctly with the number of spikes. 



3.3 A model complex cell 

A sequence of spikes from a model cell with two relevant directions was simulated by 
projecting each of the stimuli on vectors that differ by n/2 in their spatial phase, taken 
to mimic properties of complex cells, as in Fig. [| A particular frame leads to a spike 
according to a logical OR, that is if either si = s ■ e lf —s\, s 2 = s • e 2 , or — s 2 exceeds a 
threshold value s t in the presence of noise. Similarly to (|T2[), 



P (spike | s) 
P{ spike) 



f[si,s 2 ) 



si\-s t -£i) V 0(|s 2 | - s t - £ 2 )> 



(16) 



where £i and £ 2 are independent Gaussian variables. The sampling of this input-output 
function by our particular set of natural stimuli is shown in Fig. ^|(c). Some, especially 
large, combinations of values of si and s 2 are not present in the ensemble. As is well 
known, reverse correlation fails in this case because the spike-triggered average stimulus 
is zero, although with Gaussian stimuli the spike-triggered covariance method would 
recover the relevant dimensions. Here we show that searching for maximally informative 
dimensions allows us to recover the relevant subspace even under more natural stimulus 
conditions. 

We start by maximizing information with respect to one direction. Contrary to the 
result Fig. |T|(e) for a simple cell, one optimal direction recovers only about 60% of the total 
information per spike [Eq. (f|)]. Perhaps surprisingly, because of the strong correlations 
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in natural scenes, even projection onto a random vector in the D ~ 10 3 dimensional stim- 
ulus space has a high probability of explaining 60% of total information per spike. We 
therefore go on to maximize information with respect to two directions. An example of 
the reconstruction of input-output function of a complex cell is given in Fig. [3|. Vectors 
Vi and v 2 that maximize /(vi, v 2 ) are not orthogonal, and are also rotated with respect to 
ei and e 2 . However, the quality of reconstruction is independent of a particular choice of 
basis with the RS. The appropriate measure of similarity between the two planes is the 
dot product of their normals. In the example of Fig. [|, h^ u ^ ■ ri( vi ,v 2 ) ~ 0.8. 



model reconstruction 
(a) e i (d) v i 




Figure 3: Analysis of a model complex cell with relevant direction e\ and e 2 shown in 
(a) and (b). Spikes are generated according to an "OR" input-output function /(si,s 2 ) 
with the threshold s t = 0.6(s max — s min ) and noise variance a = 0.05(s max — s min ). Panel 
(c) shows how the input-output function is sampled by our ensemble of stimuli. Dark 
pixels for large values of s x and s 2 correspond to cases where P(si, s 2 ) = 0. On the right, 
we show vectors Vi and v 2 found by maximizing information J(v 1; v 2 ) together with the 
corresponding input-output function with respect to projections s ■ vi and s ■ v 2 , panel (f). 

Maximizing information with respect to two directions requires a significantly slower 
cooling rate, and consequently longer computational times. However, the expected error 
in the reconstruction, 1 — h^i,e 2 ) ' ™(vi,v 2 )/ follows a NZ^ behavior, similarly to (^5|), and is 
roughly twice that for a simple cell given the same number of spikes. 
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4 Remarks 



In conclusion, features of the stimulus that are most relevant for generating the response 
of a neuron can be found by maximizing information between the sequence of responses 
and the projection of stimuli on trial vectors within the stimulus space. Calculated in this 
manner, information becomes a function of direction in a stimulus space. Those direc- 
tions that maximize the information and account for the total information per response 
of interest span the relevant subspace. This analysis allows the reconstruction of the rele- 
vant subspace without assuming a particular form of the input-output function. It can be 
strongly nonlinear within the relevant subspace, and is to be estimated from experimen- 
tal histograms. Most importantly, this method can be used with any stimulus ensemble, 
even those that are strongly non-Gaussian as in the case of natural images. 
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